N-tuple Zipf Analysis and Modeling for Language, Computer Program and DNA

Authors

  • Xiaocong Gan
  • Dahui Wang
  • Zhangang Han
Abstract

The n-tuple power law is widespread in language, computer program code, DNA, and music. Based on extensive Zipf analyses of the n-tuple power law in empirical data, we propose a model that explains the n-tuple power-law feature found in these information carriers. Our model is a preferential selection approach inspired by Simon’s model, which explained the scaling law for single symbols in the Zipf analysis of a sequence. The core mechanism of our model is simple: a random copy-and-paste process that repeatedly selects a random segment of the current sequence and appends it to the end. Simulations show that the n-tuple power law appears in model-generated data. Furthermore, two estimating equations, one for the Zipf exponent and one for the minimal n-tuple length at which the power law appears, both agree well with empirical data. Our model can also reproduce the symmetry-breaking process behind the differing numbers of A, T, G, and C in DNA data.
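
The copy-and-paste mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the segment start and length are drawn, so uniform choices are assumed here, and the names `copy_paste_sequence`, `ntuple_rank_frequency`, and the `max_segment` parameter are hypothetical.

```python
import random
from collections import Counter


def copy_paste_sequence(seed, target_length, max_segment=None, rng=None):
    """Grow a sequence by repeatedly appending a copy of a random segment.

    Assumption (not stated in the abstract): the segment start is uniform
    over the current sequence, and the segment length is uniform between 1
    and the number of symbols available from that start (optionally capped
    by the hypothetical `max_segment` parameter).
    """
    if rng is None:
        rng = random.Random()
    seq = list(seed)
    while len(seq) < target_length:
        start = rng.randrange(len(seq))
        max_len = len(seq) - start
        if max_segment is not None:
            max_len = min(max_len, max_segment)
        length = rng.randint(1, max_len)
        # The slice is copied before extend() mutates the list.
        seq.extend(seq[start:start + length])
    return "".join(seq[:target_length])


def ntuple_rank_frequency(sequence, n):
    """Count overlapping n-tuples and return their counts in rank order."""
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    return sorted(counts.values(), reverse=True)


if __name__ == "__main__":
    text = copy_paste_sequence("ATGC", 100_000, rng=random.Random(1))
    freqs = ntuple_rank_frequency(text, 6)
    # Print (rank, count) pairs at a few ranks for a log-log inspection.
    for rank in (1, 10, 100, 1000):
        if rank <= len(freqs):
            print(rank, freqs[rank - 1])
```

Plotting rank against count on log-log axes for several values of `n` is one way to check whether a generated sequence shows power-law behavior. The exponent and the minimal n-tuple length are not estimated here, since the abstract does not give those equations.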

Related resources

A Possible Origin of Power-Law Behavior in n-Tuple Zipf Analysis

In n-tuple Zipf analysis, "words" are defined as strings of n digits, and their normalized frequency of occurrence v is measured for a given "text" (sequence of digits). In the case of various non-Markovian sequences, the probability density of the frequencies P(v) has a power-law tail. Here we argue that a broad class of unbiased binary texts exhibiting a nonexponential distribution of clu...

Comment on "Linguistic features of noncoding DNA sequences"

In a recent letter [1], Mantegna et al. report that certain statistical signatures of natural language can be found in non-coding DNA sequences. The vast majority of DNA in higher organisms, including humans, consists of non-coding sequences whose function, if any, is unknown. Hence this new analysis is quite important. It suggests, as the authors concluded, "the possible existence of one (or...

The Influence of Data-Driven Exercises Through Using a Computer Program on Vocabulary Improvement in an EFL Context

The present study was conducted to evaluate data driven learning (DDL) combined with Computer Assisted Language Learning (CALL) as an approach to improving vocabulary knowledge of Iranian postgraduates majoring in teaching English, English literature and translation. The purpose was to help language learners get familiar with DDL as a student-centered method taking advantage of a computer progr...

Modeling and Evaluation of Stochastic Discrete-Event Systems with RayLang Formalism

In recent years, formal methods have been used as an important tool for performance evaluation and verification of a wide range of systems. From the viewpoint of engineers and practitioners, however, there are still some major difficulties in using formal methods. In this paper, we introduce a new formal modeling language to fill the gaps between object-oriented programming languages (OOPLs) us...

Publication date: 2009